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Abstract — Numerous applications in signal processing have 
benefited from the theory of compressed sensing which shows that 
it is possible to reconstruct signals sampled below the Nyquist 
rate when certain conditions are satisfied. One of these conditions 
is that there exists a known transform that represents the signal 
with a sufficiently small number of non-zero coefficients. However 
when the signal to be reconstructed is composed of moving 
images or volumes, it is challenging to form such regularization 
constraints with traditional transforms such as wavelets. In this 
paper, we present a motion compensating prior for such signals 
that is derived directly from the optical flow constraint and 
can utilize the motion information during compressed sensing 
reconstruction. Proposed regularization method can be used in 
a wide variety of applications involving compressed sensing and 
images or volumes of moving and deforming objects. It is also 
shown that it is possible to estimate the signal and the motion 
jointly or separately. Practical examples from magnetic resonance 
imaging has been presented to demonstrate the benefit of the 
proposed method. 



I. Introduction 

System inversion problems in imaging has been extensively 
studied for decades and the developments in compressed 
sensing approach in the last decade has provided a new per- 
spective and opportunities for solution of these problems. By 
making use of sparse reconstruction principles, it is shown that 
compressed sensing enables reliable recovery of signals even 
if system response is measured at a rate below the Nyquist 
rate [1]. Numerous fields has benefited from this discovery 
especially if data acquisition is limited due to constraints. 

Medical imaging is one of the application areas that adopted 
compressed sensing principles rather quickly. It is shown that 
in magnetic resonance imaging (MRI), the number of mea- 
surement samples and thus the scan time can be reduced while 
preserving image quality if compressed sensing principles are 
used [2]. This is further improved by parallel imaging with 
algorithms such as SparseSENSE utilizing multiple coils [3]. 
Although acquisition of compressed measurements are not as 
straightforward as MRI in X-ray computer tomography (CT), 
compressed sensing is shown to help for partial reconstruction 
from a subset of measurements [4]. Photo-acoustic imaging 
is another application for which compressed sensing improves 
reconstructed image quality [5]. These initial studies with com- 
pressed sensing were on still images or volumes and spatial 
total variation (TV) or wavelets were used as regularization 
constraints. 



Dynamic imaging is shown to benefit from compressed 
sensing even further due to the images being significantly 
correlated along temporal dimension and therefore represented 
with sparse temporal transforms. Similarly to the spatial case, 
temporal TV and wavelets are commonly used in dynamic 
CT and MRI [6], [7]. The temporal Fourier transform is also 
utilized with k-t SparseSENSE [8] especially with cardiac 
MRI, due to the cardiac motion being periodic therefore sparse 
with respect to Fourier transform. K-t Group Sparse SENSE 
is introduced by Usman et al. [9] which groups pixels with 
respect to their spatio temporal activity to treat static and dy- 
namic regions differently during reconstruction. Alternatively 
Kalman filter is utilized by Vaswani [10] providing quick se- 
quential reconstruction of images in time. However when there 
is motion in the observed object such as in cardiac imaging or 
due to patient motion during the acquisition, these temporal 
regularization methods do not always sufficiently sparsify the 
signal. Therefore methods compensating for motion have been 
investigated. 

The compression of images or volumes of moving objects 
in time is a well-investigated field and many efficient video 
compression algorithms exist today such as H.264/AVC to 
exploit temporal dependence in video. The common approach 
in video compression is sequentially estimating each frame 
from available reference frames by estimating the motion in 
between. Considering the high compression achieved by video 
compression methods, there are algorithms that approach the 
compressed sensing reconstruction problem similarly. Some 
basic methods using information from an existing reference 
frame during reconstruction are presented in [II]. This is 
further improved by Jung et al. utilizing multiple reference 
frames and motion compensation in k-t FOCUSS [12], [13] 
or other works such as [14]. One of the drawbacks of these 
reference frame based reconstructions is that, unlike the video 
compression framework, having good quality reference frames 
in compressed sensing is not realistic in practice where the 
entire signal is reconstructed simultaneously not sequentially. 
In this paper we propose a motion compensating prior that 
significantly sparsifies multidimensional signals that involves 
motion therefore enabling high quality compressed sensing 
reconstruction. The proposed prior is a natural extension 
of the optical flow constraint and can be considered as a 
generalization of temporal TV. The motion is estimated either 
separately from or jointly with the signal. The proposed prior 
can be used with a variety of compressed sensing applications 



that involve moving objects. 

The paper is organized as follows; the formulation of com- 
pressed sensing reconstruction and some recent approaches 
are summarized in Section II, the proposed reconstruction 
method is presented in Section III, some experimental results 
with dynamic MRI are reported in Section IV and finally the 
conclusions and future work are discussed in Section V. 

II. Compressed Sensing Reconstruction 

Compressed sensing can be formulated as a system inver- 
sion problem in general with 



y = Hx + n 



(1) 



where y is the observation or measurement vector, x is the 
signal to be reconstructed, n is the additive noise and H is 
a matrix defining the linear operations on x. H can take on 
different forms depending on the application at hand but it is 
assumed to be ill conditioned and therefore there are infinitely 
many solutions for x. Compressed sensing shows that the 
signal x can be recovered almost certainly if [1], [2], 

1) ^x is sufficiently sparse for a known transform * 
(v|/'^ = \I>^' = I, .' indicating the conjugate transpose) 

2) H and \I/ are sufficiently incoherent, i.e. the off diagonal 
entries of 'S'H'HnI/' after normalization are sufficiently 
small 

One of the breakthroughs in compressed sensing is that the 
2nd condition is shown to be satisfied when H is formed by 
random entries, or by a random subset of rows of a full rank 
matrix regardless of 'J/. The 1st condition is also relaxed such 
that ^ can be an overcomplete transform or dictionary or a 
penalty operator as in case of total variation [15], [16]. Under 
these conditions it is shown that x can be recovered almost 
certainly with the minimization 



w — argmm ||w|[o s.t. 

w 

X = \I/'w 



H*'w 2 < e 



(2) 
(3) 



in which 



is the Lo-norm (which is in fact a quasi-norm) 



that is the number of non-zero entries of a vector ( x L = 



i/p 



< e 



^x^ j for p > 0). The constraint ||y — H^'w||2 

ensures consistency with the measurements and it is optimum 
for i.i.d. Gaussian noise with standard deviation e, although 
it has been shown that it works fairly well for other noise 
distributions such as Poisson. The minimization in (2) is 
non-convex and therefore practical methods do not guarantee 
convergence to global minimum. It is shown in the sparse 
reconstruction literature that the minimization of ii-norm as 
in 



w = argmm |[w|[i s.t. 

w 

X = \I/'w 



H*'w|2 <e 



(4) 
(5) 



will lead to the same solution as in (2) provided that w is 
sufficiently sparse [17]. The implications of this equivalence 
is significant since (4) is a convex optimization problem with 
a global minimum and can be solved very efficiently with 



methods such as Alternating Direction Method of Multipliers 
(ADMM) [18], [19] or C-SALSA [20]. In order to further 
simplify the minimization, the constraint in (4) can be removed 
to form the unconstrained formulation 

w = argminA|lw|li + -|ly - H\l>'w||2 (6) 

X = *'w (7) 

A number of fast minimization algorithms exist for the so- 
lution of unconstrained minimization in (6) such as TwIST 
[21], FISTA [22], ADMM and SALSA [23] and the resuh is 
equivalent to (4) if A is adjusted properly. 

The synthesis prior formulation in (4) and (6) is used with 
overcomplete transforms such as wavelets or non-orthogonal 
dictionaries. An alternative approach uses an analysis prior 
formulation with a penalty term for regularization as in 

1, 



X == arg min A_R(x) H — ||y 

X ^ 



Hxll 



(8) 



in which -R(x) is the penalty function that is large when x 
has characteristics different from a prior knowledge of x. A 
commonly used example for such penalty functions is the total 
variation (TV) which is defined for ID signals as 



TV(x) 



Z^ 



{•^i -^i—l I 



(9) 



and penalizes the signal gradient. Let us define our signal as 
concatenation of vectors of consecutive images (or volumes), 
Xi, such that 

x^ixf-.-x^J^ (10) 

where .^ represents non-conjugate transpose and rit is the 
number of frames. Therefore the total variation in temporal 
dimension penalizing the temporal gradient can be defined as 



TVt(x) ^ ^ ||xj -Xi_i||i 



I 
I I 



-I I 



Xl 



= IIDtxIl 



(11) 



in which Dt is the temporal difference matrix. Minimization of 
(8) with -R(x) = TVt(x) is possible with algorithms such as 
MFISTA [22] or ADMM, and can be sufficient for signals with 
insignificant motion or large static areas between consecutive 
frames. However when signals have substantial motion, the 
frame difference may not be sufficiently sparse. 

In order to improve the sparsity of regularization and 
therefore the accuracy of reconstruction when dealing with 
signals with motion, alternative reconstruction methods that 
make use of motion compensation have been proposed [11], 
[12], [13]. The common approach in these methods is using 
motion compensation to create an initial estimate for the 
signal from available data and then using this estimate as 
a prior during the reconstruction. A simple example to this 
approach is estimating a number of frames between two 



reference frames by motion interpolation assuming the motion 
in between is linear [13]. The estimate for these frames 



[Xe 



■Xpl^l^ can then be used to create a more 



sparse error signal Xr ~ x — Xe, and the entire signal can 
be reconstructed by the minimization 

Xr = argmin Ai?(xr) H — 

where y — y — Hxg 



||y - HXr 



(12) 



X = Xg + Xr (13) 

in which Xg is constant. _R(.) is either just the Li-norm 
(assuming Xr is sufficiently sparse) or a penalty function such 
as TV. Various methods to estimate Xg have been proposed in 
[11], [12], [13], most of which makes use of reference frames 
or low resolution reconstruction of the frames. However this 
approach may be inefficient for mainly two reasons: 

1) Inaccurate Xg.- Unlike video compression, accurate ref- 
erence frames are often not present in compressed 
sensing and therefore Xg may not be accurate enough 
to result in a Xr with sufficient sparsity. 

2) Signal-to-noise ratio (SNR) penalty: Although (12) is 
very similar to (8), it can be seen that y is replaced 
with y = y — Hxg which may significantly affect the 
result unless sparsity is vastly increased to compensate 
considering the SNR of the measurements is reduced 
(E{y)/E{n) » E{y)/E{n)) [24]. 

Therefore a better utilization of motion during reconstruction 
is preferable in order to avoid these drawbacks. 

III. A Motion Compensating Prior - Motion-TV 

Let us assume that the motion vectors between the frames 
of a signal are known but not the signal itself. Without the 
reference frames the motion information would be useless for 
reconstruction in video compression whereas in compressed 
sensing this information could be quite helpful. One of the 
main assumptions for motion estimation between images is 
the optical flow constraint (OFC) 



Xi{3) = Xi^l(s + Vi{s)) 



(14) 



with s being the spatial index, which basically states that for 
a spatially continuous signal the signal intensity is constant 
along the motion path in consecutive frames. In case of 
discrete signals this equation does not exactly hold due to 
interpolation error and therefore becomes 



Ki i_iXi_i 



(15) 



where K^ i_i = K(vi) is a matrix that represents the geomet- 
ric transform and interpolation defined by the motion vectors 
, Vj, between frames i and (« — 1) and therefore K^ i_iXj_i 
is the motion compensated estimate of x^ from Xi_i. If all 
motion vectors correspond to integer pixel locations, Ki.i_i 
is simply a matrix with each row having a single non-zero 
entry equal to 1 . In the case when motion vectors point to non- 
integer locations, rows of K^ i_i can have multiple non zero 
entries depending on the interpolation method (a maximum of 
four non-zero entries for bilinear interpolation, 16 non-zero 
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Fig. 1. The pixel Xi(s) and tlie motion vector v; associated with it in Frame 



entries for bicubic interpolation, etc.) representing a weighted 
average of nearby pixel locations. An example of Ki^i_i can 
be seen in 



Ki 



-i(si) 

-l(s2) 
-1(S3) 

-l(.'4) 



(16) 
which represents the bilinear interpolation of pixel Xi(s) from 
horizontally and vertically closest pixels to Xi_i(s + Vi(s)) in 
the previous frame (xi_i(si), Xi_i(s2), Xi_i(s3), Xi_i(s4)) 
in a 2D image sequence as depicted in Figure 1. 

The inaccuracy in (15) due to the interpolation error, can 
be corrected with a residual term such as 



Xi J^i^i—l^i—1 I Xr^ 
^Xr^ — X^ Ja.^,^— iX^— 1 



(17) 
(18) 



where Ki,i_i ~ K(vi) as stated earlier, Xr^ is the residual 
signal after motion compensated estimation for ith frame. 
Reorganizing (18) we have 



^rrit. 



K 



2.1 











K 



nt,nt — l 



Xl 



Xr =Kt(v)x 



(19) 



(20) 



Assuming Xr in (19) is sparse, the signal can be recon- 
structed by the constrained minimization 



X — argmin ||Kt(v)x||i s.t. ||y — Hx||2 < e 



(21) 



It can be observed from (19) and (21) that the regularization 
term ||Kt(v)x||i is a more general form of temporal TV 
in (11) incorporating the motion information into the recon- 
struction, therefore we refer to it as "motion compensated 
total variation" or Motion-TV. Note that Xrj in (18) will be 
significantly more sparse than the frame difference x^ — Xi_i 
penalized by TV when dealing with large motion since it 
minimizes the pixel difference along the motion trajectory. The 
minimization in (21) can be carried out with an Alternating 



Direction Method of Multipliers (ADMM) based approach 
with variable separation by solving the equivalent problem 



X = argmm ||p||i 



s.t. p = Kt(v)ni, m = X, ||y — Hx||2 < e 



(22) 



The steps of the ADMM algorithm at each iteration solving 
(22) can be summarized as 



{p, m,x} ^ argmiii||p||i + ^i||p - Kt(v)m - di 

p,m,x 

+ ^2|Im-x-d2||2 
+ M3||y-Hx-s-d3||^ 
r Oif ||y-Hx-d3||l<e 

S ^ -^ 7?(y-Hx-d3) -f II pj^^ ^_||2 



di 

d2 
da 



(^ ||y-Hx-d3||2 

di-(p-Kt(v)m) 

d2 - (m - x) 

dg - (y - Hx - s) 



d3|li>e 



(23) 

(24) 

(25) 
(26) 
(27) 



in which /j,i are constants and d^ and s are intermediate 
variables. Details of the minimization algorithm are given in 
Algorithm 1 in which the function &{.) is the element wise 
soft thresolding operation defined as 



e{a,T)^ 



T — if |a|>T 
\a\ 

if lal < T 



(28) 



A detailed derivation of the ADMM approach to solve (21) is 
out of the scope of this paper; readers who are interested are 
encouraged to read [19], [18], [25] and the references therein. 
The minimization in (21) with Motion-TV provides re- 
construction without any use of reference frames and SNR 
penalty unlike (12), however it still requires motion vectors. 
It is possible to estimate these motion vectors with a separate 
optimization from an initial low quality estimate of the signal 
or jointly with the signal itself. 

A. Separate Optimization of the Signal and the Motion 

In order to estimate the motion vectors, an initial reconstruc- 
tion of the signal can be used which may suffer from artifacts 
such as temporal blurring but can still be sufficient for motion 
estimation with suitable estimation algorithms. Therefore the 
signal can be reconstructed with a 3 step algorithm as shown 
in Algorithm 2. 

In the initial reconstruction phase the signal is reconstructed 
with regularization ||Rfx||i which can be temporal TV, tem- 
poral wavelet or temporal Fourier transform (DFT), in case 
of a periodic signal. This initial reconstruction can then be 
used in the motion estimation step which can be performed 
by various registration or motion estimation algorithms almost 
all minimizing an objective function in the form as in line 3 
of Algorithm 2. The term i?(v) regularizes the motion field 
if necessary while |jKt(v)x*||2 enforces the optical flow con- 
straint as stated earlier. Note that, in order to enable accurate 
motion estimation in step 2, the initial reconstruction should 
be sufficiently accurate so that the optical flow constraint is 
satisfied. 



Algorithm 1 ADMM minimization to solve (21) 
1: procedure MotionTV_ADMM{y , H, Kt (v) , e) 

Minimizes ||p||i s.t\i — Kt(v)m, m = x, ||y— Hxljg < e 
Set fii,fi2,fJ-3 > 0, did2, da, s = 0, x = H'y, m = x 
while convergence criteria not met do 

p^ argmin||p||i + ^i|jp - Kt(v)m - di|l| 

= 6(Kt(v)m + di, 1 ^ 



2^1 



ni<— argmin /ii j|p — Kt(v)m — di||2 
+M2||m-x -i-ii^ 



= (^I + Kt(v)'Kt(v^ 



d2||2 
-1 



[Kt(v)'(p-di) 



Oif ||y- 



7i(y-Hx-d3) 

|y— Hx— dalh 



if ||y-Hx 



13 II 5 
i3||i 



> e 



X ^ argmin /i2||in — x — d 



2115 



+Ai3||y-Hx-s-d3||2 






H'H 



[H'(y - s - dg) 



M3 ^ ^' 



di 
d2 
d3 



di 

d2 

ds 

end while 
return x 
end procedure 



(P- 
(m 

(y- 



Kt(v)m) 
-x) 
Hx s) 



The algorithm for registration or motion estimation depends 
on the motion characteristics of the signal. In the simplest case, 
there can be global motion such as translational or rotational, 
which is seen in case of a moving camera in video applications 
or a moving object in medical imaging. The global motion 
parameters can be determined with iterative search algorithms 
to minimize ||Kt(v)x*||2 without the need for regularization 
(i?(v) = 0) because it is an over-complete problem with few 
unknowns. 

A harder and more interesting problem arises when there 
is localized motion as well, such as in the case of multiple 
moving objects at different directions or speeds, or objects 
changing shape non-uniformly where the motion vector at 
each pixel may differ from its neighbor in general. Estimating 
the motion field where all pixels are allowed to have dif- 
ferent motion vectors is generally referred as the deformable 
registration problem. Unlike global motion, the unknowns in 
deformable registration is multiple times the number of pixels 
(an unknown motion vector with 2 or 3 dimensions for each 
pixel) and therefore it is an ill-posed problem which can only 
be solved with the help of regularization. To reduce the number 
of unknowns and yet allow local motion, one may assume each 
block of pixels have the same motion vector, as done in video 
compression. However, this block-wise motion representation 
is not very accurate for shape deformation often encountered 
in medical images, therefore every pixel can be represented 



Hx||2 < , 



Algorithm 2 Motion-TV regularization with separate motion 

estimation 

1: procedure MotionTV_Separate(y , H, e) 

2: X* ^ argmin ||Rtx||i s.t. |[y - 

X 

> Initial Reconstruction 
3: V* ^ argminAi?(v) + |tKt(v)x*||| 

V 

> Motion Estimation / Registration 
4: X <— argmin ||Kt(v*)x||i s.t. ||y — Hx||| < e 

X 

Final Reconstruction 
5: return x 
6: end procedure 



with a different motion vector for local registration. 

Problems involving both global and local motion can be 
solved by first estimating the global motion parameters and 
then determining the local motion through deformable regis- 
tration. Global motion parameters can be robustly estimated 
even if there are local deformations or noise due to the 
over-complete nature of the problem. After correcting for the 
global motion, the local motion can be estimated by iteratively 
solving 



argmin Ai?(v 

Av 



Av) + llKt(v + Av)x|l2 (29) 



until convergence, in which Av is the small update of the 
motion field with a known step size. Many popular deformable 
registration methods such as variants of Demon's Algorithm 
[26], [27], [28] or Finite Element Model (FEM) based methods 
[29], [30] solve (29) efficiently yielding a small deformation 
of the images towards the reference images. Motion estimation 
methods used in video compression such as block matching 
can still be adjusted to estimate a motion vector for each pixel, 
however when the deformation is significant aforementioned 
deformable registration algorithms can be more accurate. The 
regularization term in these algorithms are usually selected to 
enforce the smoothness of the motion field gradient (i?(v) — 
J2 1 1 ^■^i II 2 fo'" Demon's Algorithm, combinations of first and 

i 

second spatial gradient often used for FEM based methods). 
Diffeomorphism, i.e. the condition of the geometric transform 
or motion field being smooth and invertible, can also be 
enforced to avoid physically unlikely motion fields either 
through regularization or by replacing the addition (v + Av) 
in (29) with other operations as in [28]. 

Both the global and the local motion estimation are robust 
against the artifacts presented by the initial reconstruction step 
either due to over-completeness or regularization of the motion 
field. After the registration step, the final reconstruction of the 
signal is carried out by enforcing the Motion-TV prior, using 
the estimated motion. 



B. Joint Optimization of the Signal and the Motion 

Instead of separately solving for x and v, each involving a 
number of iterations, another approach can be estimating them 



jointly by alternating their iterations such as in 

{p, m,x} ^ argmin IIpIIi + ^ijjp - Kt(v)m - di|l2 

p,m,x 

+ y^2||m-x-d2||2 

Hx-s-dall^ (30) 



A*3||y 



V + arg min A_R(v + Av) 

Av 



|Kt(v- 



Av)x||^ (31) 



Update s,di,d2,d3 



(32) 



where s, di, d2, da are updated the same as in (24), (25), (26) 
and (27). This attempt however fails miserably due to noisy 
artifacts in x during the iterations introduced by the term H'y 
such as in line 7 of Algorithm 1. H is an ill conditioned 
matrix as defined by the compressed sensing problem and the 

term H'y = H'(Hx + n) (or {j^\ + H'h") H'(Hx + n) 
as in ADMM based algorithms such as Algorithm 1) results 
in noisy artifacts (possibly along both spatial and temporal 
dimensions) due to non-zero entries off the main diagonal of 
H'H resulting from incompleteness. This term is unavoidable 
for any algorithm that solves (4), (6) or (8) since it results 
from the derivative of the data consistency term (||y — HxlH). 
In compressed sensing, this noise is iteratively filtered in 
the sparse transform domain with non-linear methods (such 
as soft-thresholding in line 5 of Algorithm 1) and often 
diminish exponentially, although not completely removed until 
convergence. Therefore estimating v from a noisy x by solving 
(31) before convergence leads to an incorrect motion field and 
a non-sparse Kt(v)x which in turn inhibits the convergence 
of both V and x or leads to incorrect reconstruction. 

Based on this observation, we propose to filter x resulting 
from solving (30), and solve v by replacing x with the filtered 
version x* in (31). To filter x, we again make use of the 
compressed sensing theory, which states that the noisy artifacts 
in the signal can be removed by non-linear filtering in an 
incoherent transform domain. The noise in x can be removed 
by solving 



X* = argmin/3||$tx||i + ||x — x| 



(33) 



where /3 can be adjusted at each iteration according to noise 
power in the sparse temporal transform domain defined by $f 
Provided that *t*t' = *t'*t ~ I, (33) can be minimized 
with a single step 



x*-*t'6(*tx,|) 



(34) 



where &{.) is an element- wise soft thresholding operator as 
defined earlier. $t must be selected as a sufficiently sparse 
orthonormal transform, a temporal transform such as temporal 
wavelets or DFT if applicable, in order to remove the noise 
at the expense of removing a part of the signal as well. The 
main advantage of using temporal wavelets or DFT as $t 
is that they separate the signal in the (temporal) frequency 
domain which usually leads to sparse high pass components 
for natural signals. As a result, the thresholding operation in 
this transform domain leads to a temporally blurred signal in 
the image domain, with blurring decreasing at every iteration 



as noise power and /3 gets smaller. This however is not a 
significant problem for global or local registration steps due 
to the following reasons: 

• Global registration parameters can still be estimated from 
temporally blurry image robustly even in the initial it- 
erations due to being a largely over-complete problem. 
In fact global registration can be performed only in the 
first iteration and not necessarily repeated again. Small 
inaccuracies in the estimation of global parameters can 
be compensated with local motion. 

• The local registration step solving (29) advances the 
vector field a step towards the motion direction at every 
iteration. Therefore the motion field is updated only 
considering large motion in the initial steps disregarding 
any small motion that is blurred out by the temporal 
blurring. As the iterations progress and the blurring is 
reduced, smaller vectors in the motion field can also 
be estimated. This is very similar to spatially scaling 
the images to provide multi-resolution registration and 
provides robustness towards correct estimation of the 
motion field. 

Consequently, assuming that the initial motion field is already 
corrected for global motion, the signal and the motion can be 
estimated jointly by iteratively applying 

{p, m,x} ^ argmin||p||i + ,ui||p - Kt(v)m - diHa 

p.m.x 



^2|Im-x-d2||2 
A^slly-Hx-s-dglla 



X* 4- argmin/3|l*ti||i + ||x - Si\\l 



(35) 
(36) 



V + arg niin Ai?(v + Av) 

Av 



Update s, di,d2,d3 



|Kt(v + Av)x*||2 (37) 
(38) 



with (3 — > exponentially as iterations progress. Note that 
the above iterations essentially solve the following problem: 

x,v =argmin ||Kt(v*)x||i + Ai?(v) + ||Kt(v)x*||2 



s.t. 



Hx||^ < e,v* 



V, X 



(39) 



The details of the algorithm to minimize (39) is given in 
Algorithm 3. Notice that with v and therefore Kt(v) being a 
variable in the optimization, the constrained and unconstrained 
formulations (such as in (4) and (6) respectively) are no longer 
equivalent, hence the constrained formulation in (39) is a 
necessity. 

IV. Experimental Results 

The presented algorithms are tested with cardiac MRI 
datasets acquired with multiple coils to improve reconstruction 
quality. The measurements in parallel MRI can be formulated 

as 

y = FCx + n = F[Cf •■•C?^J^x + n (40) 

where F is the spatial Fourier transform, Ci,i = 1,. . . ,Nc 
are the diagonal coil sensitivity matrices, y is the multi-coil 
measurement (k-space) data in time, n is the measurement 
noise and x is the image signal in time as defined earlier. In 



Algorithm 3 ADMM minimization to solve (39) 
L procedure Joint_MotionTV_ADMM(y , H, e) 

Minimizes ||p||i + Ai?(v) + ||Kt(v)x*||| 
s.t p = Kt(v*)ni, m = X, X* = X, V* : 

Hx||l<e 



V, 



Set ^i,/X2,Af3 > 0, did2,d3,s = 0,q: > 1 
Set X = H'y, m = x, v = v* = 0, /3 > 
while convergence criteria not met do 

p^ argmin||p||i + /ii||p - Kt(v*)m - dil|| 

= 6(Kt(v*)m + di,2i^) 



m<— argmin /ii||p — Kt(v*)ni — di 
™ +^2||m-x- d2||2 



— / B2. 



^I + Kt(v*)'Kt(v*)) ' [Kt(v*)'(p-di 



+ ^(x + d2) 
Oif ||y-Hx-d3||i<e 

7i(y-Hx-d3) .f||y_Hx-d3||i>e 

d,ii2 



|y— Hx— da 



X <— argmm /i2||m — x — U2||2 

+Ai3||y-Hx-s-d3||2 



^I + H'h) ^ [H'(y-s-d3) 
X* ^ argmin/3||*tx||i + ||x - xl^ 



= *t'©(*tx,|) 



10: 
IL 

12: 

13: 
14: 
15 
16: 
17: 
18: 



P^(3/c 



V ^ v+argminAi?(v+Av) + ||Kt(v+Av)x*l|^ 

Av 



di ^di - (p-Kt(v*)m) 

d2 ^ d2 - (m - x) 

d3 ^ d3 - (y - Hx - s) 

end while 

return x, v 
end procedure 



compressed sensing, the measurement is undersampled in a 
random fashion which can be represented as 



Hx + n = F„Cx + n = MFCx + n 



(41) 



where M is a subset of the rows of identity matrix and 
F„ = MF is the undersampled Fourier transform matrix. In 
cartesian MRI, undersampling is along the vertical and tempo- 
ral dimensions only, fully sampling the horizontal dimension. 
Experiments are performed on 2 cardiac MRI cine datasets, 
acquired in steady state, i.e. measured signal intensity does not 
change in time. Each dataset is a 2D slice of a 3D volume and 
acquired over the duration of a single heart beat of a patient 
in time. Details on the datasets can be summarized as follows: 




(a) Frames 1, 4, 7, 10, 12 of Dataset 1. The reconstruction results of the pixel locations indicated in frame 1 are shown in Figure 5a 




(b) Frames 1, 4, 9, 11, 14 of Dataset 2. The reconstruction results of the pixel locations indicated in frame 1 are shown in Figure 5b 
Fig. 2. Frames from fully sampled reconstructions of each dataset 



• Dataset 1: 2D cardiac MRI dataset acquired on a 3T 
Siemens Trio scanner using a 32-coil matrix body array. 
Fully sampled data were acquired using a 128 x 128 
matrix (FOV = 320 x 320 mm) and 22 temporal frames. 

• Dataset 2: 2D cardiac MRI dataset acquired on a 3T 
Siemens Trio scanner using a 9-coil matrix body array. 
Fully sampled data were acquired using a 256 x 256 
matrix (FOV = 320 x 320 mm) and 24 temporal frames. 

For both of the datasets, the fully sampled data is retro- 
spectively by discarding a random subset of the samples 
along the vertical axis in each temporal frame, with varying 
subsampling patterns in different frames. The distribution of 
the selected samples is uniform along the temporal axis for 
any given vertical position and Gaussian along the vertical 
axis centered on zero frequency coefficients in spatial Fourier 
transform domain. Experiments are performed at subsampling 
rates R=8,10,12,and 14 (R=8 denotes 1/8 of all the samples). 
The datasets are normalized so that the reconstruction of the 
fully sampled data leads to images with maximum intensity 
of 1. Sample frames from fully sampled reconstructions of the 
datasets can be seen in Figure 2. 

The subsampled datasets are reconstructed with Motion- 
TV both with separate and joint estimation of the motion the 
results of which are denoted by Motion-TV and Joint Motion- 
TV respectively in the figures. Note that at the experimented 
subsampling rates fully sampled reference frames to apply 
algorithms such as k-t FOCUSS are not applicable, therefore 
a comparison with these methods is not provided. Instead 
reconstruction results with temporal DFT and temporal TV 
regularization are presented for comparison. The difference of 
the first and last frame is included in both TV and Motion-TV 
formulations due to the datasets being periodic. Optimization 
with temporal DFT and temporal TV are accomplished using 
Algorithm 1 replacing Kt(v) with the proper operators. The 
steps involving matrix inversion in Algorithms 1 and 3 are 



calculated with Conjugate Gradient algorithm as suggested 
in [25]. The parameters /i^ and e in Algorithms 1 and 3 
are determined empirically through simulations. It is possible 
to analytically determine /3 and a through analysis on H, 
however due to significant size of H in the example problems, 
these parameters are also decided empirically as 2 and 1.09 
respectively. 

The datasets had no global motion but only local mo- 
tion which was especially significant around the heart area, 
therefore the region around the heart shown in Figure 2 is 
considered as the region of interest (ROI) when evaluating the 
reconstruction quality. Diffeo-symmetric Demon's registration 
algorithm with the recommended parameters in [28] is used 
as the deformable registration algorithm in order to estimate 
the motion in line 3 of Algorithm 2 as well as line 11 of 
Algorithm 3. The algorithm is applied without any multi- 
resolution scheme and is slightly modified to be applicable 
to complex valued datasets. The initial reconstruction step is 
accomplished with TV regularization for Motion-TV while <I>t 
is selected as temporal DFT for Joint Motion-TV. Bilinear 
interpolation is used for motion compensation in formulating 
Kt(v). 

The root mean squared error (RMSE) in the ROI of each 
frame with respect to fully sampled reconstructions are plotted 
in Figure 3 while the overall RMSE is listed in Tables I and 
II. It can be seen that the RMSE improvement with respect to 
TV reconstruction is small in Dataset 1 but is quite significant 
in Dataset 2. This can be explained by the fact that the 
magnitude of motion vectors are much smaller in Dataset 1 
and mostly not even greater than a single pixel (can be seen 
in Figure 6). Therefore the benefit of Motion-TV is limited. 
As the resolution and the scale of motion increase, the benefit 
of Motion-TV and Joint Motion-TV over TV and DFT can 
be seen more clearly in terms of RMSE in Figure 3b and 
Table II. The lower sparsity of temporal TV and DFT manifest 
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(b) Dataset 2 
Fig. 3. RMSE of the ROI in time for different datasets 



as spatial noise or temporal blurring which can be observed 
in the samples shown in Figure 4. 

The temporal change in the signal and its estimation ac- 
curacy is shown in Figures 5a and 5b for the sample pixels 
numbered in Figure 2 which are selected from most temporally 
varying areas in the Datasets. The temporal smoothing in DFT 
regularization can be very clearly seen in these pixel examples, 
although it is a commonly used sparsifying transform in 
compressed sensing dynamic MRI [8]. The TV regularization 
can perform better however still the staircase effect of TV is 
visible and the performance is mostly worse than Motion-TV 
and Joint Motion-TV which enforce sparsity in the temporal 
signal along the motion trajectory. 

In the presented experiments, the performance of Motion- 
TV was better than Joint Motion-TV in general. We believe 
this is due to the larger number of parameters to be adjusted 
in the Joint Motion-TV algorithm and a closer performance to 
Motion-TV is possible with better optimization of parameters. 



This claim is also supported by the resulting motion fields 
shown in in Figure 6, in which the field estimated by Joint 
Motion-TV can be observed to be at least as consistent as the 
field estimated by Motion-TV. In fact the motion field is more 
continuous along the tissue boundaries (edges) thanks to the 
multi-resolution effect during the estimation of motion field as 
discussed in Section III-B. Also note that when much larger 
motion is involved, the quality of the initial reconstruction in 
Motion-TV would suffer from severe artifacts or noise and 
the motion estimation step might therefore fail to estimate 
an accurate motion field. In these cases further iterations 
between motion estimation and signal reconstruction can be 
used which in turn leads to very long reconstruction time. 
The joint estimation however gradually estimates the motion 
field and is more robust and preferable in high motion cases 
for its efficient computation. 

Each of the separate and joint Motion-TV has benefits with 
regards to the other which can be summarized as follows: 

• Separate Optimization: Each of the optimization steps 
(initial estimation, motion estimation, final estimation) 
can be performed independently from each other with 
separate algorithms and/or implementations if necessary. 
This provides ease of implementation as well as robust- 
ness. It is also possible to correct the possible errors in the 
initial reconstruction or motion estimation steps before 
the final estimation (noise smoothed out or motion vectors 
manually corrected). The resulting reconstruction quality 
suffers if the initial reconstruction is not accurate enough 
for motion estimation. The overall reconstruction time is 
larger (close to twice with our implementation) than joint 
optimization case 

• Joint Optimization: Efficient in terms of execution time 
and motion estimation although some extra parameters 
need to be determined (/?, a) to ensure convergence and 
speed. The performance does not depend significantly 
on the quality of initial reconstructions since the signal 
and the motion are gradually estimated in joint iterations. 
Therefore joint reconstruction is preferable over separate 
reconstruction in case of signals with very large motion. 

In terms of algorithm complexity, thanks to Kt (v) being a 
very sparse matrix, line 5 of Algorithm 1 can be efficiently 
calculated using an algorithm such as conjugate gradient 
method without a significant overhead (less than 10%) on 
the total execution time of the algorithm when compared 
with simpler transforms such as temporal TV and temporal 
DFT. This is also as a result of having a very large H, 
and the operations on H being the dominant factor in the 
speed of the minimization. The speed of convergence is also 
observed to be similar. For Joint Motion-TV, the registration 
step (line 1 1 of Algorithm 3) adds around 20% increase in the 
total execution time in our implementation, while the speed 
of convergence in terms of number of iterations is similar to 
the other algorithms. For all minimizations, 100 iterations are 
observed to be sufficient for convergence. 

V. Conclusion 

A new regularization and reconstruction scheme is proposed 
for motion compensated compressed sensing to be used in 




(a) Dataset 1 




(b) Dataset 2 

Fig. 4. Reconstructed samples from all datasets (frame 4). In all figures, from left to right, fully sampled, DFT, TV, Motion-TV and Joint Motion-TV 
reconstruction is shown. First row is the reconstruction at subsampling rate R=8 and the second row is at rate R=14. 



TABLE I 
RMSE FOR ALL FRAMES OF DATASET 1 





R=8 


R=10 


R=12 


R=14 


DFT 


0.0220 


0.0239 


0.0271 


0.0286 


TV 


0.0146 


0.0168 


0.0187 


0.0199 


Motion-TV 


0.0139 


0.0158 


0.0177 


0.0187 


Joint Motion-TV 


0.0143 


0.0164 


0.0183 


0.0194 



TABLE II 

RMSE FOR ALL FRAMES OF DATASET 2 





R=8 


R=10 


R=12 


R=14 


DFT 


0.0303 


0.0379 


0.0410 


0.0485 


TV 


0.0253 


0.0265 


0.0288 


0.0305 


Motion-TV 


0.0197 


0.0214 


0.0237 


0.0260 


Joint Motion-TV 


0.0219 


0.0235 


0.0256 


0.0278 



reconstruction of images or volumes of moving objects in time. 
The proposed method can be used in scenarios that earlier 
motion compensation schemes are not applicable such as 
when no reference frames are present and outperforms known 
regularization methods without motion compensation when the 
motion is significant. In addition to improved regularization 
with a known motion field, it is also shown that, although not 
straightforward, it is possible to jointly estimate the motion and 
the signal efficiently by utilizing the principles of compressed 
sensing. 



While the provided experiment results demonstrate the 
performance in cardiac MRI reconstruction, the proposed 
algorithms are by no means limited to this application and can 
be used in applications such as reconstruction of free breathing 
medical imaging data or correction of patient movement during 
compressed sensing reconstruction. The proposed algorithms 
can also be improved by making use of better interpolation 
methods or bi-directional motion estimation therefore reducing 
estimation error and increasing sparsity which will be inves- 
tigated as a future work. 
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